I.  INTRODUCTION


The following reports on research performed under the sponsorship of
AFOSR, for a period of approximately three and a half years. This research
has primarily focused on the following three aspects of fault-tolerant
computing.

1.1  Design of fault-tolerant computers using read-only memories (ROM's)
as basic building blocks.

1.2  Design of programmable logic arrays (PLA's) and sequential
networks for testability.

1.3  Design of fault-tolerant multiprocessor network architectures.

During this initial period of the grant, major emphasis was placed on
the first two topics; research on the third topic was initiated more
recently and is continuing here at the University of Massachusetts. In this
report, we present highlights of these research activities, as well as the
pertinent research results.

This report is organized into the following main sections. In Section
II, a summary of all important results is given. Following this, in Section
III, a list is provided of all personnel associated with the research
activities. Finally, a complete listing is given of the publications that
resulted from the research supported by the grant.


II.  SUMMARY OF RESEARCH RESULTS

This section is divided into three subsections, 2.1, 2.2 and 2.3,
which discuss the results that correspond to 1.1, 1.2 and 1.3, above,
respectively.

2.1  Initial research was primarily concerned with the development of
techniques that enhance the use of ROM's as basic building blocks in
fault-tolerant computers. The predominant motivation here is that
fault-tolerant techniques can then be incorporated more easily in
ROM-based logic. The following results were obtained:

To begin with, a decomposition technique was proposed that allows
any multiple-output function to be decomposed into subfunctions. This was
achieved by using Galois switching theory, which was developed earlier by the
author. This technique provided a new and practical method to decompose a
large ROM into an interconnection of smaller ROM's. The effectiveness of
our technique was evaluated for specialized arithmetic functions such as
multipliers. It was shown that, for a large class of functions,
decompositions are indeed possible, resulting in a drastic reduction in logic.
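
As a rough illustration of why decomposition can reduce ROM storage so sharply, the sketch below uses the textbook partial-product decomposition of a multiplier. It is not the Galois-switching-theory technique summarized above; the 8-bit operand width and the names rom_bits, SMALL_ROM and multiply_8x8 are assumptions chosen only for this example.

```python
# Illustrative only: textbook partial-product decomposition, not the
# Galois-switching-theory method summarized in the report.

def rom_bits(address_bits, word_bits):
    """Storage, in bits, of a ROM with the given address and word widths."""
    return (2 ** address_bits) * word_bits

# Hypothetical small ROM: a 4-bit x 4-bit multiplier lookup table.
SMALL_ROM = [[a * b for b in range(16)] for a in range(16)]

def multiply_8x8(x, y):
    """Multiply two 8-bit values with four lookups in the 4x4 ROM plus adders."""
    xh, xl = x >> 4, x & 0xF
    yh, yl = y >> 4, y & 0xF
    return ((SMALL_ROM[xh][yh] << 8)
            + ((SMALL_ROM[xh][yl] + SMALL_ROM[xl][yh]) << 4)
            + SMALL_ROM[xl][yl])

monolithic = rom_bits(16, 16)    # one ROM: 16 address bits, 16-bit product
decomposed = 4 * rom_bits(8, 8)  # four ROMs: 8 address bits, 8-bit product
print(monolithic, decomposed)
```

Under these assumptions the single ROM needs 1,048,576 bits while the four small ROM's together need 8,192 bits. The decomposed form also requires adders to combine the partial products, which this storage count does not include.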

Secondly, a new class of codes, specially suited for ROM-based logic,
was developed, possessing the following properties:

Error correcting/detecting codes that are effective against both random
and unidirectional errors are useful in providing protection against both
permanent and transient faults. Therefore, what has been developed is a new
error-control strategy. Of particular concern here is the case in which both
random and unidirectional errors are likely. Transient faults are
likely to cause random errors, whereas permanent faults cause either random
or unidirectional errors, depending on the nature of the faults. As an
example of unidirectional errors, consider a power supply failure or a
stuck fault in a serial bus. This can result in changing the received word
to all 0's or all 1's.

Here, a class of t-error correcting and multiple unidirectional error
detecting systematic codes was developed. These codes are significantly more
efficient than the earlier codes. The efficiency of these codes approaches
the efficiency of the BCH codes, asymptotically. Furthermore, it was shown
that these codes can be easily decoded. Also, we have developed a
generalization of Berger codes over Z^; these new codes are also shown to be
optimal.
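
For background, the classic binary Berger code (not the generalization or the new t-error correcting codes described above) can be sketched as follows. The check field is the binary count of 0's in the information bits, which is why any unidirectional error pattern, including a word forced to all 0's or all 1's, is detected. The function names and the 8-bit example word are assumptions for illustration.

```python
# Classic binary Berger code, shown only as background for the codes
# discussed in the report; names and word length are illustrative.

def berger_encode(info_bits):
    """Append the count of 0s in info_bits as a fixed-width binary check field."""
    k = len(info_bits)
    r = k.bit_length()                 # equals ceil(log2(k + 1)) check bits
    zeros = info_bits.count(0)
    check = [(zeros >> i) & 1 for i in reversed(range(r))]
    return info_bits + check

def berger_check(word, k):
    """Return True if the check field equals the zero count of the first k bits."""
    info, check = word[:k], word[k:]
    value = 0
    for bit in check:
        value = (value << 1) | bit
    return info.count(0) == value

codeword = berger_encode([1, 0, 1, 1, 0, 0, 1, 0])
print(berger_check(codeword, 8))              # valid codeword
print(berger_check([0] * len(codeword), 8))   # word forced to all 0s
print(berger_check([1] * len(codeword), 8))   # word forced to all 1s
```

A 1-to-0 error can only raise the zero count of the information bits and can only lower the value of the check field, so the two can no longer agree; the 0-to-1 case is symmetric.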

These coding techniques have been used to propose certain
fault-tolerant ROM-based logic design methods.

2.2  Next, research was carried out in the area of design for
testability for programmable logic arrays (PLA's) and sequential networks.
PLA's are increasingly replacing custom logic elements in the design of a
wide variety of digital systems. The chief advantage of PLA's is their
regularity; this results in both simpler designs and fast turn-around time.
However, the increasing complexity of PLA circuits has meant that the
traditional stuck-at fault model has become inadequate.

Therefore, growing attention has recently been focused on other, more
complex types of faults. An important class of such faults is that of
bridging faults. In fact, some recent studies have shown that bridging
faults (short circuits) are actually the cause of many failures.

Here, the effects of undetectable bridging faults in programmable logic
arrays were studied. Furthermore, it was shown that an undetectable bridging
fault can invalidate a crosspoint fault test set. A design of PLA's was
also presented in which one can detect all single bridging faults by
applying at most (m+2) tests (where m is the number of product lines). The
proposed design may require at most two extra inputs.
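
To make the fault model concrete, the following sketch enumerates the input vectors that detect one bridging fault between two product lines of a tiny PLA. It does not reproduce the (m+2)-test design described above. The two product terms, the single OR-plane output, and the wired-AND behaviour of the bridged lines are all assumptions for illustration; the actual behaviour of a short depends on the circuit technology.

```python
from itertools import product

# Hypothetical 3-input PLA: each product term maps input index -> required value.
PRODUCT_TERMS = [{0: 1, 1: 1},   # P0 = a AND b
                 {0: 1, 2: 1}]   # P1 = a AND c

def product_lines(inputs, bridged=None):
    """Evaluate the AND plane; optionally short two product lines together."""
    lines = [int(all(inputs[i] == v for i, v in term.items()))
             for term in PRODUCT_TERMS]
    if bridged is not None:
        i, j = bridged
        lines[i] = lines[j] = lines[i] & lines[j]   # assumed wired-AND short
    return lines

def output(inputs, bridged=None):
    """Single OR-plane output that sums all product lines."""
    return int(any(product_lines(inputs, bridged)))

def detecting_tests(bridged):
    """Input vectors whose output differs between fault-free and faulty PLA."""
    return [v for v in product((0, 1), repeat=3)
            if output(v) != output(v, bridged)]

print(detecting_tests((0, 1)))   # [(1, 0, 1), (1, 1, 0)]
```

Only inputs that drive the two bridged lines to different values can expose the fault; if no such input also propagates the difference to an output, the bridging fault is undetectable.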

The design of sequential networks that can facilitate fault detection
is becoming an ever-increasing concern. Here, a new fault-detecting design
of sequential networks was developed. Specifically, this design allows for
the use of any arbitrary number of inputs. It can therefore be
interpreted as a generalization of scan-in/scan-out design. The new design
was evaluated in the framework of checking sequences; it was shown to be
optimal. Additionally, the new design was demonstrated to be cost effective
in terms of hardware: only a small number of extra gates are required to
incorporate the testability features.

2.3  The later phase of the research has focused on one of the most
vital areas of fault-tolerant computing research today - that
is, how fault-tolerance can be incorporated into multiprocessor network
architectures. This particular problem has taken on a new dimension in the
context of VLSI which, for the first time, makes it possible to interconnect
a large number of computing elements so as to form an integrated
system.

The author has developed several new fault-tolerant interconnection
architectures that allow fault-tolerance to be incorporated as an integral
part of the design. Also being developed is a systematic design
methodology that can incorporate various fault-tolerance features directly
into such network systems. A goal of this research, then, is to develop a
sound framework through which pertinent design considerations can be
expressed quantitatively, thus providing the basis for exploring newer
fault-tolerant architectures. This is formulated by using the system
interconnection structure as the fundamental design component for the
achievement of a wide variety of fault-tolerant objectives.

Interconnection structure design for closely-coupled integrated
multi-processor systems differs in a significant way from that used in other
computer networks. This is chiefly due to the speed, size and
computational environment constraints that are unique to these systems.
Specifically, it is becoming increasingly recognized that for a large
multi-processor system, the network topology must possess the properties of
low interconnection complexity, simple routing, dynamic reconfigurability,
fault-tolerance, and the like.

All of these requirements can only be achieved if the system
interconnection architecture is designed using a systematic design
methodology that incorporates these requirements as an integral part of the
design (and not as an 'afterthought'). For example, in order to reduce
routing overhead, the network must be able to support algorithmic-based
routing, so that the nodes can be relieved of having to maintain routing
tables, directories, etc., which can be unwieldy and costly for large
systems. Dynamic reconfigurability requires that the system interconnection
efficiently admit different logical structures, such as binary tree, linear
array, etc.
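
The idea of algorithmic-based routing can be sketched on a binary hypercube, where each node computes the next hop from the source and destination addresses alone and keeps no routing table. The report does not name a topology at this point, so the hypercube and the dimension-order rule used here are assumptions for illustration, as are the function names.

```python
# Dimension-order routing on a binary hypercube (assumed topology).
# Node addresses are integers; neighbors differ in exactly one bit.

def next_hop(current, destination):
    """Return the neighbor to forward to, or None if already at the destination."""
    diff = current ^ destination
    if diff == 0:
        return None
    lowest = diff & -diff      # lowest-order bit in which the addresses differ
    return current ^ lowest

def route(source, destination):
    """List the nodes visited from source to destination, inclusive."""
    path = [source]
    while path[-1] != destination:
        path.append(next_hop(path[-1], destination))
    return path

print(route(0b000, 0b101))     # [0, 1, 5]
```

Each forwarding decision needs only one XOR and one bit operation, so the per-node routing state does not grow with the size of the system.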

Finally, fault-tolerance should be incorporated at the system
interconnection level. Thus, graceful degradation can be provided for so
that full connectivity among all of the surviving elements is maintained, in
spite of fault-induced changes.
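
The connectivity requirement can be checked mechanically: remove the faulty nodes and test whether every surviving node can still reach every other. The sketch below does this with a breadth-first search on a binary hypercube; the topology, the fault sets, and the function names are assumptions for illustration and are not taken from the report.

```python
from collections import deque

def hypercube_neighbors(node, dim):
    """Neighbors of a node in a dim-dimensional binary hypercube."""
    return [node ^ (1 << i) for i in range(dim)]

def survivors_connected(dim, faulty):
    """True if all non-faulty nodes of a dim-cube can still reach one another."""
    alive = set(range(2 ** dim)) - set(faulty)
    if not alive:
        return True
    start = next(iter(alive))
    seen = {start}
    queue = deque([start])
    while queue:                       # breadth-first search over survivors
        node = queue.popleft()
        for nb in hypercube_neighbors(node, dim):
            if nb in alive and nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return seen == alive

print(survivors_connected(3, {1, 2}))      # True
print(survivors_connected(3, {1, 2, 4}))   # False: node 0 is cut off
```

In the 3-cube every node has three neighbors, so any two node faults leave the survivors connected, while three faults placed around one node isolate it.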

In implementing fault-tolerance, the effects of link, node and
subsystem faults must all be taken into account (where a subsystem may
contain a cluster of nodes). Importantly, the effectiveness of any
fault-tolerant technique depends greatly on the effectiveness of the testing
and diagnosis capabilities provided to detect and locate the faulty link(s),
node(s), and subsystem(s). The following elaborates more fully on these
considerations.

Another area of research is the development of architectures built by
interconnecting a large number of processing elements on a single chip or
wafer. Two important problems related to such VLSI processor arrays
are the focus of this research: fault-tolerance, and the
development of techniques that better utilize the inherent computational
capabilities of these arrays.

Fault-tolerance in these VLSI processor arrays is of real practical
significance; it provides for much-needed reliability improvement, as well
as for yield enhancement. Therefore, the first objective of the study is to
identify the underlying concepts and relationships of fault-tolerance at
work in these arrays. These concepts are then used to formulate
techniques that incorporate fault-tolerance integrally into the design.
Also being developed are models that evaluate how yield enhancement may be
achieved by certain new fault-tolerant techniques.

Secondly, a novel approach has been developed that uses these
arrays for general computation, based on the mapping of
certain data flow graphs directly onto these arrays. The overall
effectiveness of the approach is being extended by research on detailed
architectural support for implementation and on optimization of the design
for cost-performance improvements.

The research accomplished has covered a broad spectrum of
important areas in fault-tolerant computing. Research on fault-tolerant
network architectures for multiprocessors and VLSI-based systems is
actively continuing.


